Getting the most out of limited resources allows advances in natural language processing (NLP) research and practice while conserving resources. Those resources may be data, time, storage, or energy. Recent work in NLP has yielded interesting results from scaling; however, using scale alone to improve results means that resource consumption also scales. That relationship motivates research into efficient methods that require fewer resources to achieve similar results. This survey covers methods and findings on efficiency in NLP, aiming to guide new researchers in the field and to inspire the development of new methods.
Training large neural language models on large datasets is resource- and time-intensive. These requirements create a barrier to entry, where those with fewer resources cannot build competitive models. This paper presents various techniques for how to (a) train a large language model using the resources that a modest research lab might have, and (b) train it in a reasonable amount of time. We provide concrete recommendations for practitioners, which we illustrate with a case study: a T5 model for Danish, the first for this language.
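The abstract does not name the specific techniques, but a typical ingredient of training large models on modest hardware is trading compute for memory. Below is a minimal PyTorch sketch combining mixed-precision training with gradient accumulation; the model, data loader, and step counts are placeholders, not details from the paper:

```python
# Illustrative sketch: mixed precision + gradient accumulation, two common
# ways to fit large-model training onto modest hardware.
import torch

def train_epoch(model, loader, optimizer, accum_steps=8, device="cuda"):
    scaler = torch.cuda.amp.GradScaler()  # rescales loss to avoid fp16 underflow
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.cuda.amp.autocast():               # forward in reduced precision
            loss = model(**batch).loss / accum_steps  # assumes an HF-style model output
        scaler.scale(loss).backward()                 # gradients accumulate across steps
        if (step + 1) % accum_steps == 0:             # emulates an 8x larger batch
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
```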
Measuring inter-annotator agreement is important for annotation tasks, but many metrics require a fully annotated dataset (or subset), in which every annotator has annotated every sample. We define Sparse Probability of Agreement (SPA), which estimates the probability of agreement when not all annotator-item pairs are available. We show that, under a few assumptions, SPA is an unbiased estimator, and we provide multiple different weighting schemes for handling samples with varying numbers of annotations, which we evaluate on a range of datasets.
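As a rough illustration of the idea, a sketch of pairwise agreement over sparse annotations follows; the pooling of all annotator pairs across items is an assumption on my part, not one of the paper's weighting schemes:

```python
from itertools import combinations

def sparse_probability_of_agreement(annotations):
    """Estimate agreement from sparse annotations.

    `annotations` maps item -> {annotator: label}; each item may be labeled
    by a different, overlapping subset of annotators. Items here are
    implicitly weighted by their number of annotator pairs; the paper
    proposes and evaluates several alternative weighting schemes.
    """
    agreeing, total_pairs = 0, 0
    for labels in annotations.values():
        values = list(labels.values())
        if len(values) < 2:
            continue  # a single annotation yields no pair to compare
        for a, b in combinations(values, 2):
            agreeing += int(a == b)
            total_pairs += 1
    return agreeing / total_pairs if total_pairs else float("nan")
```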
This paper documents a dataset of sentence pairs between Faroese and Danish, produced at ITU Copenhagen. The data covers translation from each of the two source languages and is intended for use as training data for machine translation systems for this language pair.
The task of learning to map an input set onto a permuted sequence of its elements is challenging for neural networks. Set-to-sequence problems occur in natural language processing, computer vision, and structure prediction, where interactions between elements of large sets define the optimal output. Models must exhibit relational reasoning, handle varying cardinalities, and manage combinatorial complexity. Previous attention-based methods require $n$ layers of set transformations to explicitly represent $n$-th order relations. Our aim is to enhance their ability to efficiently model higher-order interactions through an additional interdependence component. We propose a novel neural set encoding method called the Set Interdependence Transformer, capable of relating a set's permutation-invariant representation to its elements within sets of any cardinality. We combine it with a permutation learning module into a complete three-part set-to-sequence model and demonstrate its state-of-the-art performance on a number of tasks. These range from combinatorial optimization problems, through permutation learning challenges on both synthetic and established NLP datasets for sentence ordering, to a novel domain of product catalog structure prediction. Additionally, the network's ability to generalize to unseen sequence lengths is investigated, and a comparative empirical analysis of existing methods' ability to learn higher-order interactions is provided.
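A rough sketch of the core idea follows: jointly attending over a pooled, permutation-invariant set summary and the individual elements, so that set-level structure can condition element updates. The layer composition, pooling choice, and dimensions are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SetInterdependenceSketch(nn.Module):
    """Attend a pooled set summary together with the elements, relating
    the set's permutation-invariant representation to its members."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                        # x: (batch, n_elements, dim)
        summary = x.mean(dim=1, keepdim=True)    # permutation-invariant pooling
        tokens = torch.cat([summary, x], dim=1)  # prepend the set token
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)         # residual connection + norm
        return tokens[:, 0], tokens[:, 1:]       # set representation, elements
```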
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
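Patch-based training, the most common remedy for oversized samples above, amounts to cropping fixed-size regions instead of processing whole images or volumes at once; a minimal sketch follows (the patch shape and random sampling strategy are illustrative choices, not survey findings):

```python
import numpy as np

def sample_patches(volume, patch_size=(64, 64, 64), n_patches=8, rng=None):
    """Randomly crop fixed-size patches from an array too large to
    process at once, as in patch-based training."""
    rng = rng or np.random.default_rng()
    patches = []
    for _ in range(n_patches):
        corner = [rng.integers(0, s - p + 1)
                  for s, p in zip(volume.shape, patch_size)]
        window = tuple(slice(c, c + p) for c, p in zip(corner, patch_size))
        patches.append(volume[window])
    return np.stack(patches)
```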
With the rise of AI in recent years and the increase in complexity of the models, the growing demand for computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly potent accelerators and the use of large compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, energy efficiency plays an important role for AI model developers and infrastructure operators alike. The energy consumption of AI workloads depends on the model implementation and the utilized hardware. Therefore, accurate measurements of the power draw of AI workflows on different types of compute nodes are key to algorithmic improvements and the design of future compute clusters and hardware. To this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes. Our results indicate that (1) deriving energy consumption directly from runtime is not accurate; the composition of the compute node needs to be taken into account; (2) neglecting accelerator hardware on mixed nodes leads to disproportionate inefficiency in energy consumption; (3) the energy consumption of model training and inference should be considered separately: while training on GPUs outperforms all other node types in both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer, enabling an easy transfer to other workloads and raising user awareness of energy consumption.
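The paper's measurements rely on node-level monitoring, which, as result (1) stresses, is exactly what a runtime-only estimate misses. As a rough user-level analogue covering the GPU alone, one can sample the device's power draw with NVML and integrate over the workload's runtime; the sampling interval and the piecewise-constant power assumption are simplifications:

```python
import threading
import time

import pynvml  # NVIDIA Management Library bindings (package: nvidia-ml-py)

def measure_gpu_energy(workload, device_index=0, interval_s=0.1):
    """Approximate the GPU energy (joules) consumed while `workload` runs,
    by sampling power draw and integrating over time."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    stop, samples = threading.Event(), []

    def sampler():
        while not stop.is_set():
            # nvmlDeviceGetPowerUsage reports milliwatts
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
            time.sleep(interval_s)

    thread = threading.Thread(target=sampler)
    thread.start()
    start = time.time()
    workload()
    runtime = time.time() - start
    stop.set()
    thread.join()
    pynvml.nvmlShutdown()
    return sum(samples) * interval_s, runtime  # (approx. joules, seconds)
```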
Objective: Thigh muscle group segmentation is important for assessing muscle anatomy, metabolic disease, and aging. Many efforts have been put into quantifying muscle tissues with magnetic resonance (MR) imaging, including manual annotation of individual muscles. However, leveraging publicly available annotations in MR images to achieve muscle group segmentation on single-slice computed tomography (CT) thigh images is challenging. Method: We propose an unsupervised domain adaptation pipeline with self-training to transfer labels from 3D MR to single CT slices. First, we transform the image appearance from MR to CT with CycleGAN and feed the synthesized CT images to a segmenter simultaneously. Single CT slices are divided into hard and easy cohorts based on the entropy of pseudo-labels inferred by the segmenter. After refining the easy-cohort pseudo-labels based on an anatomical assumption, self-training with the easy and hard splits is applied to fine-tune the segmenter. Results: On 152 withheld single CT thigh images, the proposed pipeline achieved a mean Dice of 0.888 (0.041) across all muscle groups, including the sartorius, hamstrings, quadriceps femoris, and gracilis muscles. Conclusion: To the best of our knowledge, this is the first pipeline to achieve thigh imaging domain adaptation from MR to CT. The proposed pipeline is effective and robust in extracting muscle groups on 2D single-slice CT thigh images. The container is available for public use at https://github.com/MASILab/DA_CT_muscle_seg
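The easy/hard split hinges on the entropy of the segmenter's pseudo-labels; a minimal sketch of that step follows (the percentile-based threshold is my assumption, as the abstract does not specify the cutoff):

```python
import numpy as np

def split_by_entropy(prob_maps, percentile=50):
    """Split slices into easy/hard cohorts by the mean pixelwise entropy
    of softmax pseudo-labels. prob_maps: (n_slices, n_classes, H, W)."""
    eps = 1e-8
    entropy = -(prob_maps * np.log(prob_maps + eps)).sum(axis=1)  # (n, H, W)
    scores = entropy.mean(axis=(1, 2))              # mean entropy per slice
    threshold = np.percentile(scores, percentile)   # low entropy = confident
    easy = np.where(scores <= threshold)[0]
    hard = np.where(scores > threshold)[0]
    return easy, hard
```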
In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski type to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classifications. The energy essentially depends on the interaction between two distributions modelling likelihoods for the associated classes. We overcome typical strict regularity assumptions for the distributions by only assuming that they have bounded $BV$ densities. In the natural topology coming from compactness, we prove Gamma-convergence to a weighted perimeter with weight determined by an anisotropic function of the two densities. Despite being local, this sharp interface limit reflects classification stability with respect to adversarial perturbations. We further apply our results to deduce Gamma-convergence of the associated total variations, to study the asymptotics of adversarial training, and to prove Gamma-convergence of graph discretizations for the nonlocal perimeter.
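For orientation, a nonlocal perimeter of Minkowski type weighted by the two class densities $\rho_0$, $\rho_1$ can be written schematically as below; this generic form is a reconstruction from the abstract, not necessarily the paper's exact functional:

```latex
\mathrm{Per}_\varepsilon(A)
  = \frac{1}{\varepsilon}\left(
      \int_{A^{\varepsilon}\setminus A} \rho_0(x)\,\mathrm{d}x
    + \int_{A\setminus A^{-\varepsilon}} \rho_1(x)\,\mathrm{d}x
    \right),
```

where $A^{\varepsilon}$ and $A^{-\varepsilon}$ denote the $\varepsilon$-dilation and $\varepsilon$-erosion of $A$ (Minkowski sum and difference with the scaled unit ball), so the functional charges the $\varepsilon$-neighborhood of the decision boundary with the respective class likelihoods. As $\varepsilon \to 0$, the paper's result gives Gamma-convergence to a local perimeter weighted by an anisotropic function of $\rho_0$ and $\rho_1$.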
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
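Since the weights are openly released, the model can be loaded with standard tooling; a minimal generation example via Hugging Face transformers follows, using one of the smaller released variants (the full 176B checkpoint, bigscience/bloom, requires substantial multi-GPU hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A smaller released BLOOM variant, convenient for experimentation.
model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("BLOOM is an open-access language model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```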